Bayesian Averaging of Classifiers and the Overfitting Problem
Author
Abstract
Although Bayesian model averaging is theoretically the optimal method for combining learned models, it has seen very little use in machine learning. In this paper we study its application to combining rule sets, and compare it with bagging and partitioning, two popular but more ad hoc alternatives. Our experiments show that, surprisingly, Bayesian model averaging’s error rates are consistently higher than the other methods’. Further investigation shows this to be due to a marked tendency to overfit on the part of Bayesian model averaging, contradicting previous beliefs that it solves (or avoids) the overfitting problem.
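The contrast at the heart of the comparison can be made concrete with a small sketch. This is not the paper's rule-set learner; it only assumes that each learned model outputs a class-probability vector and carries a log posterior weight log P(h | D), and it shows how Bayesian model averaging combines models by posterior weight while bagging gives every model an equal vote.

```python
import numpy as np

def bayesian_model_average(prob_matrix, log_posteriors):
    """Posterior-weighted mixture of the models' class-probability rows."""
    w = np.exp(log_posteriors - np.max(log_posteriors))  # shift to avoid underflow
    w = w / w.sum()                                       # normalized posterior weights
    return w @ prob_matrix

def bagged_average(prob_matrix):
    """Bagging-style combination: every model gets an equal vote."""
    return prob_matrix.mean(axis=0)

# Toy example: three learned models predicting over two classes.
probs = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])
log_post = np.array([-3.0, -20.0, -25.0])   # hypothetical log P(h | D) values

print(bayesian_model_average(probs, log_post))  # ~[0.9, 0.1]: one model dominates
print(bagged_average(probs))                    # [0.57, 0.43]: uniform vote
```

Note how, in the toy posterior, almost all of the weight lands on a single model, so the "average" behaves more like selecting the model that best fits the training data than like an ensemble; this is the kind of behaviour the overfitting finding above points to.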
Similar Papers
Bayesian Conditional Gaussian Network Classifiers with Applications to Mass Spectra Classification
Classifiers based on probabilistic graphical models are very effective. In continuous domains, maximum likelihood is usually used to assess the predictions of those classifiers. When data is scarce, this can easily lead to overfitting. In any probabilistic setting, Bayesian averaging (BA) provides theoretically optimal predictions and is known to be robust to overfitting. In this work we introd...
Bayesian Model Averaging with Cross-Validated Models
Several variants of Bayesian Model Averaging (BMA) are described and evaluated on a model library of heterogeneous classifiers, and compared to other classifier combination methods. In particular, embedded cross-validation is investigated as a technique for reducing overfitting in BMA.
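One way to make the combination of BMA with cross-validated models concrete is to weight each candidate by its held-out rather than training-set likelihood. The sketch below assumes scikit-learn-style classifiers; the model library, fold count, and weighting rule are illustrative and not the procedure evaluated in that paper.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def cv_log_likelihood(model, X, y, n_splits=5):
    """Sum of held-out log P(y_i | x_i) over the cross-validation folds."""
    total = 0.0
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=0).split(X):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        proba = fitted.predict_proba(X[test_idx])
        # iris labels 0..2 match the predict_proba column order
        total += np.sum(np.log(proba[np.arange(len(test_idx)), y[test_idx]] + 1e-12))
    return total

X, y = load_iris(return_X_y=True)
models = {"GaussianNB": GaussianNB(),
          "DecisionTree": DecisionTreeClassifier(max_depth=3, random_state=0)}

scores = np.array([cv_log_likelihood(m, X, y) for m in models.values()])
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # BMA-style weights driven by held-out fit
print(dict(zip(models, weights.round(3))))
```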
Selecting One Dependency Estimators in Bayesian Network Using Different MDL Scores and Overfitting Criterion
The Averaged One Dependency Estimator (AODE) integrates all possible Super-Parent-One-Dependency Estimators (SPODEs) and estimates class conditional probabilities by averaging them. In an AODE network, redundant SPODEs may bias the classifier and, as a consequence, substantially reduce classification accuracy. In this paper, a kind of MDL metric is used to sele...
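The averaging step described above has a compact standard form; a sketch in the usual notation (n attributes, with attribute x_i acting as super-parent, and the common minimum-frequency condition on the super-parent omitted):

```latex
% AODE: average the joint estimates of all SPODEs, one per super-parent x_i
\hat{P}(y \mid x_1,\dots,x_n) \;\propto\;
  \frac{1}{n}\sum_{i=1}^{n} \hat{P}(y, x_i) \prod_{j \neq i} \hat{P}(x_j \mid y, x_i)
```

Pruning redundant SPODEs, as the snippet proposes, amounts to restricting the sum to a selected subset of super-parents.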
Compression-Based Averaging of Selective Naive Bayes Classifiers
The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most prob...
Learning Bayesian Belief Network Classifiers: Algorithms and System
This paper investigates the methods for learning Bayesian belief network (BN) based predictive models for classification. Our primary interests are in the unrestricted Bayesian network and Bayesian multi-net based classifiers. We present our algorithms for learning these classifiers and also the methods for fighting the overfitting problem. A natural method for feature subset selection is also ...